Letter-to-Phoneme Conversion Based on Two-Stage Neural Network Focusing on Letter and Phoneme Contexts

نویسندگان

  • Kheang Seng
  • Yurie Iribe
  • Tsuneo Nitta
چکیده

The improvement of Letter-To-Phoneme (L2P) conversion that can output the phoneme strings corresponding to Out-OfVocabulary (OOV) words, especially in English language, has become one of the most important issues in Text-To-Speech (TTS) research. In this paper, we propose a Two-Stage Neural Network (NN) based approach to solve the problem of conflicting output at a phonemic level. Both Letter and Phoneme Context-Dependent models are combined and implemented in the first-stage NN to convert several letters into several phonemes. Then, the second-stage NN can predict the final output phoneme by observing on a combination of several consecutive phoneme sequences that obtained from the first-stage NN. Therefore, our L2P conversion module takes a sequence of letters as input and outputs only one phoneme at each time. By focusing mainly on the result of word accuracy of OOV words, this new approach usually provides a higher performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...

متن کامل

Solving the Phoneme Conflict in Grapheme-to-Phoneme Conversion Using a Two-Stage Neural Network-Based Approach

To achieve high quality output speech synthesis systems, data-driven grapheme-to-phoneme (G2P) conversion is usually used to generate the phonetic transcription of out-of-vocabulary (OOV) words. To improve the performance of G2P conversion, this paper deals with the problem of conflicting phonemes, where an input grapheme can, in the same context, produce many possible output phonemes at the sa...

متن کامل

Comparative Evaluation of Letter-to-sound Conversion Techniques for English

Dictionary look-up is the primary strategy for deriving pronunciations for input words in a text-to-speech (TTS) system. This strategy is accurate for dictionary words, but it is not complete: it is impossible to list exhaustively all input words. The proper treatment ofùnknown' words is currently an unsolved problem in TTS synthesis. There are many competing techniques for letter-to-sound conv...

متن کامل

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

متن کامل

Comparison of two tree-structured approaches for grapheme-to-phoneme conversion

Recently, we described a two-step self-learning approach for grapheme-to-phoneme (G2P) conversion [1]. In the first step, grapheme and phoneme strings in the training data are aligned via an iterative Viterbi procedure that may insert graphemic and phonemic nulls where required. In the second step, a Trie structure encoding pronunciation rules is generated. In this paper we describe the alignme...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011